AmpNet - A Highly Available Cluster Interconnection Network

Authors

  • Amy W. Apon
  • Larry Wilbur
Abstract

One of the most important challenges facing computing clusters in the foreseeable future is providing fault-tolerant, high-availability cluster hardware for non-stop applications. This capability is needed in addition to high throughput and low latency. This paper presents the Advanced MultiProcessor Network (AmpNet), a gigabit-speed cluster interconnect designed with these issues in mind. The AmpNet Network Interface Card (NIC) uses network-shared memory as a network cache to provide a fault-tolerant, self-healing network with no data loss. Higher-level network-centric services use network-shared memory to ensure high availability and continuity of service in applications. In addition, the programmable NIC, with low-latency messaging protocols and field-upgradeable soft logic, provides a foundation for researchers who would like to develop additional cluster services and protocols for network-centric computing. This paper describes the fault-tolerant design and implementation of the AmpNet hardware architecture.

Similar resources

Performance Analysis of a New Neural Network for Routing in Mesh Interconnection Networks

Routing is one of the basic parts of a message passing multiprocessor system. The routing procedure has a great impact on the efficiency of a system. Neural algorithms that are currently in use for computer networks require a large number of neurons. If a specific topology of a multiprocessor network is considered, the number of neurons can be reduced. In this paper a new recurrent neural ne...

Computing in the RAIN: a reliable array of independent nodes - Parallel and Distributed Systems, IEEE Transactions on

The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes...

Memory Hierarchy Considerations for Cost-effective Cluster Computing

Using off-the-shelf commodity workstations and PCs to build a cluster for parallel computing has become a common practice. The cost-effectiveness of a cluster computing platform for a given budget and for certain types of applications is mainly determined by its memory hierarchy and the interconnection network configurations of the cluster. Finding such a cost-effective solution from exhaustive sim...

Performance Analysis of Interconnection Networks for Multi-cluster Systems

With the current popularity of cluster computing systems, it is increasingly important to understand the capabilities and potential performance of various interconnection networks. In this paper, we propose an analytical model for studying the capabilities and potential performance of interconnection networks for multi-cluster systems. The model takes into account stochastic quantities as well ...

Publication date: 2003